Problem Set 2 Fall 2023

Author

Alexandra Kakadiaris (UNI:ak5087) and Xinran She (UNI: xs2518)

Note: Grading is based both on your graphs and verbal explanations. Follow all best practices as discussed in class, including choosing appropriate parameters for all graphs. Do not expect the assignment questions to spell out precisely how the graphs should be drawn. Sometimes guidance will be provided, but the absence of guidance does not mean that all choices are ok.

1. Netflix

[10 points]

Data: netflix.csv

# Import packages needed  
library(readr) 
library(ggplot2) 
library(dplyr) 

# Import netflix csv file into r 
netflix <- read_csv("netflix.csv") 
# Make sure imported correctly 
head(netflix)
# A tibble: 6 × 12
  show_id type    title    director cast  country date_added release_year rating
  <chr>   <chr>   <chr>    <chr>    <chr> <chr>   <chr>             <dbl> <chr> 
1 s1      Movie   Dick Jo… Kirsten… <NA>  United… September…         2020 PG-13 
2 s2      TV Show Blood &… <NA>     Ama … South … September…         2021 TV-MA 
3 s3      TV Show Ganglan… Julien … Sami… <NA>    September…         2021 TV-MA 
4 s4      TV Show Jailbir… <NA>     <NA>  <NA>    September…         2021 TV-MA 
5 s5      TV Show Kota Fa… <NA>     Mayu… India   September…         2021 TV-MA 
6 s6      TV Show Midnigh… Mike Fl… Kate… <NA>    September…         2021 TV-MA 
# ℹ 3 more variables: duration <chr>, listed_in <chr>, description <chr>
  a. Create a frequency bar chart for movie ratings in the United States. (Hint: if you’re not familiar with U.S. movie ratings, look them up.)

Use the same data (U.S. movies) for the remaining parts of the question.

# Keep only movies (type column) produced in the United States (country column);
# filter() also drops rows where country is NA
movies <- netflix %>% filter(type == "Movie", country == "United States")
#head(movies)

# List the distinct ratings; anything outside the MPAA system (G -> PG -> PG-13 -> R -> NC-17) will be removed
unique(movies$rating) 
 [1] "PG-13"    NA         "TV-MA"    "TV-Y7"    "PG"       "R"       
 [7] "TV-PG"    "TV-14"    "TV-G"     "G"        "TV-Y"     "74 min"  
[13] "84 min"   "66 min"   "NR"       "TV-Y7-FV" "NC-17"    "UR"      
# Keep only rows whose rating belongs to the MPAA system
subset_movies_df <- subset(movies, rating %in% c("PG-13", "PG", "R", "G", "NC-17"))
subset_movies_df
# A tibble: 923 × 12
   show_id type  title     director cast  country date_added release_year rating
   <chr>   <chr> <chr>     <chr>    <chr> <chr>   <chr>             <dbl> <chr> 
 1 s1      Movie Dick Joh… Kirsten… <NA>  United… September…         2020 PG-13 
 2 s10     Movie The Star… Theodor… Meli… United… September…         2021 PG-13 
 3 s28     Movie Grown Ups Dennis … Adam… United… September…         2010 PG-13 
 4 s29     Movie Dark Ski… Scott S… Keri… United… September…         2013 PG-13 
 5 s42     Movie Jaws      Steven … Roy … United… September…         1975 PG    
 6 s43     Movie Jaws 2    Jeannot… Roy … United… September…         1978 PG    
 7 s44     Movie Jaws 3    Joe Alv… Denn… United… September…         1983 PG    
 8 s45     Movie Jaws: Th… Joseph … Lorr… United… September…         1987 PG-13 
 9 s49     Movie Training… Antoine… Denz… United… September…         2001 R     
10 s82     Movie Kate      Cedric … Mary… United… September…         2021 R     
# ℹ 913 more rows
# ℹ 3 more variables: duration <chr>, listed_in <chr>, description <chr>
# Create a factor column with specific ordering
subset_movies_df$rating <- factor(subset_movies_df$rating, levels = c("G","PG","PG-13","R","NC-17"))

# Create a bar plot of the counts
ggplot(subset_movies_df, aes(x = rating)) + 
  geom_bar(fill ="lightgreen", color="black") + 
  labs(title = "Movie Ratings in the US", x = "Rating", y = "Frequency") + 
  theme_minimal()

For this analysis, we first subsetted the data to U.S. movies only, filtering on the “type” column (“Movie”, not “TV Show”) and the “country” column (“United States”). We then kept only the rows whose “rating” belongs to the MPAA rating system (G, PG, PG-13, R, and NC-17), which removes TV ratings, NR/UR, missing values, and mis-entered values such as “74 min”. Finally, we used factor() to order the ratings from least to most restrictive. With the data pre-processed, we used ggplot() to draw a bar chart of the number of movies in each rating.
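As a quick check of the pre-processing logic, the same three steps can be run on a tiny, made-up data frame. The sketch below uses base R only; the data frame `toy` and its values are hypothetical (not taken from netflix.csv), and `which()` is used so that a row with a missing country is dropped rather than turned into a row of NAs.

```r
# Hypothetical toy data standing in for netflix.csv (invented values)
toy <- data.frame(
  type    = c("Movie", "Movie", "TV Show", "Movie", "Movie", "Movie"),
  country = c("United States", "United States", "United States",
              NA, "United States", "United States"),
  rating  = c("R", "PG-13", "TV-MA", "PG", "74 min", "PG"),
  stringsAsFactors = FALSE
)

# MPAA ratings, ordered from least to most restrictive
mpaa <- c("G", "PG", "PG-13", "R", "NC-17")

# Keep U.S. movies; which() drops rows where the condition is NA
us_movies <- toy[which(toy$type == "Movie" & toy$country == "United States"), ]

# Keep MPAA ratings only: drops TV ratings and mis-entered values like "74 min"
us_movies <- us_movies[us_movies$rating %in% mpaa, ]

# Explicit levels give the bar order G -> NC-17 instead of alphabetical order
us_movies$rating <- factor(us_movies$rating, levels = mpaa)

rating_counts <- table(us_movies$rating)
print(rating_counts)
```

Note that table() keeps empty levels (G and NC-17 here), whereas geom_bar() drops them from the x-axis unless scale_x_discrete(drop = FALSE) is added.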

  b. Suppose we want to understand trends in ratings over time. There are multiple ways by which we could convert the numeric variable release_year into a categorical variable for faceting purposes. For this part, we will divide the release year into equal range groups (similar to binwidths), namely decade periods. Use cut() to create a new variable called decade to represent the decades: 1950-1959, 1960-1969, etc. Redraw the graph from part a) faceting on decade. What trends do you observe?

Hint: you can eliminate exponential notation produced by cut() by increasing the value of the dig.lab parameter.

# Create a new variable 'decade' by cutting 'release_year' into decade groups
decades <- c("1950-1959", "1960-1969", "1970-1979", "1980-1989", "1990-1999", "2000-2009", "2010-2019", "2020-2021")

subset_movies_df$decade <- cut(subset_movies_df$release_year, breaks = seq(1950, 2030, by = 10), right = FALSE, labels=decades)

# Create a bar plot faceted by 'decade' to observe trends in ratings over time
ggplot(subset_movies_df, aes(x = rating)) +
  geom_bar(fill="lightgreen") + 
  facet_wrap(~ decade, nrow = 4) +
  labs(title = "Trends in Ratings Over Time by Decade", x = "Rating", y = "Frequency")


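For this question, since we wanted release-year groups of equal range, we used cut() with breaks every ten years to group the movies by decade. It was important to set right = FALSE so that each interval is closed on the left and open on the right, e.g. [1960, 1970); otherwise a movie released in 1960 would be counted in 1950-1959. Supplying our own labels also avoids the exponential notation that cut() can produce in its default labels. We then redrew the bar chart with facet_wrap() on decade.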
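The effect of `right = FALSE` is easiest to see on boundary years. A minimal base R sketch with hypothetical years (the vector `years` is illustrative, not taken from the data):

```r
# Boundary years chosen to test the edges of each decade
years <- c(1955, 1959, 1960, 1969, 1970, 2019, 2020, 2021)

decade_labels <- c("1950-1959", "1960-1969", "1970-1979", "1980-1989",
                   "1990-1999", "2000-2009", "2010-2019", "2020-2021")
decade_breaks <- seq(1950, 2030, by = 10)  # 9 breaks -> 8 intervals

# right = FALSE: intervals are [1950, 1960), [1960, 1970), ...
decade <- cut(years, breaks = decade_breaks, right = FALSE, labels = decade_labels)

# Default right = TRUE: intervals are (1950, 1960], so 1960 lands in the wrong decade
decade_right <- cut(years, breaks = decade_breaks, labels = decade_labels)

print(data.frame(years, decade, decade_right))
```

With the default `right = TRUE`, 1960, 1970, and 2020 would each be counted in the previous decade.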

Based on this graph, the first four decades (1950-1989) contain very few movies. The number of movies increases in the next two decades (1990-1999 and 2000-2009), and 2010-2019 has the most movies of any decade. The last panel (2020-2021) has relatively few movies, which is expected because it spans only two release years. In terms of ratings (setting 2020-2021 aside), the number of PG-13 movies has increased over time. Across decades, R is the most common rating, followed by PG-13; the exception is 2000-2009, when R was slightly less common than PG-13.

  c. Another option is to divide the release years into groups of equal size rather than equal range, which is the strategy used by boxplots. Use cut() with quantile() to divide the data into four groups of roughly equal size and again redraw the ratings bar chart from part a) faceted by the new column (call it period). Make sure your labeling is clear so the reader knows what is being shown in each facet and how the years were split. Describe the advantages and disadvantages of this method compared to the method in part b).

Hint: if you end up with an NA for period figure out why and fix it appropriately.

# Determine the quantiles of release_year
quantiles <- quantile(subset_movies_df$release_year)

# Create a new column 'period' by cutting release_year into quantile-based groups
subset_movies_df$period <- cut(subset_movies_df$release_year, breaks = quantiles, labels = c("Q1", "Q2", "Q3", "Q4"), include.lowest = TRUE, right=FALSE)

# Calculate the minimum and maximum release year for each quarter
min_max_by_period <- subset_movies_df %>% 
  group_by(period) %>%
  summarize(min_release_year = min(release_year), max_release_year = max(release_year)) 
print(min_max_by_period)
# A tibble: 4 × 3
  period min_release_year max_release_year
  <fct>             <dbl>            <dbl>
1 Q1                 1955             2002
2 Q2                 2003             2011
3 Q3                 2012             2016
4 Q4                 2017             2021
# Make labeling clear about release year information
subset_movies_df <- subset_movies_df %>% 
  mutate(period = recode(period, "Q1" = "1955-2002", 
                                 "Q2" = "2003-2011", 
                                 "Q3" = "2012-2016", 
                                 "Q4" = "2017-2021"))

# Create a ratings bar chart faceted by 'period'

ggplot(subset_movies_df, aes(x = rating)) +
  geom_bar(fill = "lightgreen") +
  facet_wrap(~ period, nrow = 2) +
  labs(title = "U.S. Movie Ratings by Release-Year Quartile",
       subtitle = "Periods from cut() + quantile(); each holds roughly 25% of the movies",
       x = "Rating", y = "Frequency") +
  theme(strip.text.x = element_text(size = 11, face = "bold"))

With this question, we first computed the quantiles of release_year and used them as the breaks in cut() to create a new column called period. We set right = FALSE so that the intervals are closed on the left and open on the right, and include.lowest = TRUE so that the latest release year (2021), which equals the top break, is still included in the last interval instead of becoming NA. To make the labels meaningful, we found the minimum and maximum release year in each group and recoded Q1-Q4 to those year ranges. After that, we faceted the bar chart from part a) on the new “period” column.
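The NA mentioned in the hint comes from the outermost quantile breaks, which equal the smallest and largest release years. The sketch below reproduces it on eight hypothetical release years (made-up values, chosen without ties) and shows how `include.lowest = TRUE` fixes it.

```r
# Hypothetical release years (8 movies) with no ties
years <- c(1955, 1990, 2003, 2005, 2012, 2014, 2017, 2021)
q <- quantile(years)  # min, 25%, 50%, 75%, max

# Default right = TRUE gives (a, b] intervals, so the minimum year becomes NA
p_default <- cut(years, breaks = q)

# right = FALSE gives [a, b) intervals, so the maximum year becomes NA
p_left <- cut(years, breaks = q, right = FALSE)

# include.lowest = TRUE closes the outer end (the top break when right = FALSE);
# dig.lab = 6 keeps the interval labels out of exponential notation
p_fixed <- cut(years, breaks = q, right = FALSE, include.lowest = TRUE, dig.lab = 6)

print(table(p_fixed, useNA = "ifany"))
```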

An advantage of this method is that each group contains roughly the same number of movies, so every facet is based on a comparable amount of data, and quantile() chooses the cut points automatically instead of requiring us to decide by hand how to split the data. A disadvantage is that the groups cover very different numbers of years (1955-2002 spans 48 years, while 2017-2021 spans only five). This makes changes over time harder to interpret than with the equal-width decades of part b).

  d. A tidyverse alternative to the cut() + quantile() method of part c) is to use dplyr::ntile() to divide data into groups of equal size. Redo part c) using ntile(). Again make sure your labeling is clear.
# Use ntile() to create a new column 'period2' with equal-size groups and make labeling clear for period2
subset_movies_df <- subset_movies_df %>% 
  mutate(period2 = dplyr::ntile(release_year, 4))

min_max_by_period2 <- subset_movies_df %>% 
  group_by(period2) %>%
  summarize(min_release_year = min(release_year), max_release_year = max(release_year)) 

print(min_max_by_period2)
# A tibble: 4 × 3
  period2 min_release_year max_release_year
    <int>            <dbl>            <dbl>
1       1             1955             2003
2       2             2003             2012
3       3             2012             2017
4       4             2017             2021
subset_movies_df <- subset_movies_df %>% 
  mutate(period2 = recode(period2, "1" = "1955-2003", "2" = "2003-2012",
                                   "3" = "2012-2017", "4" = "2017-2021"))

# Create a ratings bar chart faceted by 'period2' with clear labeling
ggplot(subset_movies_df, aes(x = rating)) +
  geom_bar(fill = "lightgreen") +
  facet_wrap(~ period2, nrow = 2) +
  labs(title = "U.S. Movie Ratings by Release-Year Quartile (ntile())",
       subtitle = "Four equal-size groups from dplyr::ntile(); boundary years can appear in two groups",
       x = "Rating", y = "Frequency")

For this question, we grouped the movies with the ntile() function, which ranks the release years and splits the movies into four groups of equal size (the sizes differ by at most one). Because ntile() works on ranks rather than on interval boundaries, there is no right-open or right-closed choice to make. To make the labels meaningful, I found the minimum and maximum release year in each group and recoded the group numbers to those year ranges. Then, a ggplot() bar chart was faceted on this new way of dividing release year.

  e. Why aren’t the graphs in part c) and part d) identical? Describe the advantages and disadvantages of the method used by each (that is, cut() vs. ntile().)

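The graphs in part c) and part d) are not identical because the two methods treat ties (movies with the same release year) differently. cut() + quantile() assigns each movie to an interval between sample quantiles, so all movies from the same release year land in the same period; the periods therefore do not overlap (1955-2002, 2003-2011, 2012-2016, 2017-2021), but they contain only roughly equal numbers of movies. ntile() ranks the movies (breaking ties by row order) and splits the ranks into four groups of equal size, so movies released in a boundary year can be split between two groups. That is why the part d) ranges share their endpoints (1955-2003, 2003-2012, 2012-2017, 2017-2021) and why the bar heights in each facet differ slightly from part c).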

The advantage of cut() + quantile() is that each period is a clean, non-overlapping range of years, which is easy to label and interpret. The trade-off is that ties make the group sizes only approximately equal, and the breaks need care (right, include.lowest) to avoid an NA group. Both methods produce periods of very unequal length: for example, the first group spans about 50 years (1955-2002) while the last spans only five (2017-2021). ntile() is useful when you need groups with exactly equal sample sizes, and it is simpler to write inside a dplyr pipeline. Its disadvantage is that a single release year can be split across two groups, so the period labels overlap and the reader cannot tell from the label alone which group a 2003 movie belongs to. Because both methods are based on the ranks of the data, neither is distorted by a few unusually old movies.
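The difference between the two methods comes down to ties. The sketch below uses eight hypothetical release years in which three movies share the year 2003; it assumes dplyr is installed (it is already used in this problem set).

```r
library(dplyr)

# Hypothetical release years with a tie at 2003 (three movies)
years <- c(2001, 2003, 2003, 2003, 2010, 2012, 2017, 2017)

# ntile(): rank the movies, then split the ranks into 4 groups of equal size
g_ntile <- ntile(years, 4)

# cut() + quantile(): split the years at the sample quartiles
g_cut <- as.integer(cut(years, breaks = quantile(years),
                        right = FALSE, include.lowest = TRUE))

print(data.frame(years, g_ntile, g_cut))
print(table(g_ntile))  # 2 2 2 2: equal sizes, but 2003 appears in groups 1 and 2
print(table(g_cut))    # 1 3 2 2: all 2003 movies stay together, so sizes are unequal
```

This mirrors the real output above, where 2003, 2012, and 2017 each appear as the end of one ntile() group and the start of the next.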

2. SleepStudy

[4 points]

Data: SleepStudy in the Lock5withR package

#install.packages("Lock5withR") 
library(ggplot2) 
library("Lock5withR") 
head(SleepStudy)
  Gender ClassYear LarkOwl NumEarlyClass EarlyClass  GPA ClassesMissed
1      0         4 Neither             0          0 3.60             0
2      0         4 Neither             2          1 3.24             0
3      0         4     Owl             0          0 2.97            12
4      0         1    Lark             5          1 3.76             0
5      0         4     Owl             0          0 3.20             4
6      1         4 Neither             0          0 3.50             0
  CognitionZscore PoorSleepQuality DepressionScore AnxietyScore StressScore
1           -0.26                4               4            3           8
2            1.39                6               1            0           3
3            0.38               18              18           18           9
4            1.39                9               1            4           6
5            1.22                9               7           25          14
6           -0.04                6              14            8          28
  DepressionStatus AnxietyStatus Stress DASScore Happiness AlcoholUse Drinks
1           normal        normal normal       15        28   Moderate     10
2           normal        normal normal        4        25   Moderate      6
3         moderate        severe normal       45        17      Light      3
4           normal        normal normal       11        32      Light      2
5           normal        severe normal       46        15   Moderate      4
6         moderate      moderate   high       50        22    Abstain      0
  WeekdayBed WeekdayRise WeekdaySleep WeekendBed WeekendRise WeekendSleep
1      25.75        8.70         7.70      25.75        9.50         5.88
2      25.70        8.20         6.80      26.00       10.00         7.25
3      27.44        6.55         3.00      28.00       12.59        10.09
4      23.50        7.17         6.77      27.00        8.00         7.25
5      25.90        8.67         6.09      23.75        9.50         7.00
6      23.80        8.95         9.05      26.00       10.75         9.00
  AverageSleep AllNighter    Sex allNighter earlyClass
1         7.18          0 Female         No         No
2         6.93          0 Female         No        Yes
3         5.02          0 Female         No         No
4         6.90          0 Female         No        Yes
5         6.35          0 Female         No         No
6         9.04          0   Male         No         No
dim(SleepStudy)
[1] 253  30

For each of the following parts, draw a bar chart or histogram as appropriate to show frequency counts. Hint: check the x-axis carefully for clear, human-readable labels and appropriate tick marks and tick mark labels. In a bar chart, every bar should be labeled.

  a. Gender
# Change 0 and 1's of Gender to be Male and Female
SleepStudy <- SleepStudy %>% 
  mutate(Gender = case_when(
    Gender == 1 ~ "Male",
    Gender == 0 ~ "Female",
    TRUE ~ as.character(Gender)))

# Create a bar chart for gender
ggplot(SleepStudy, aes(x = Gender)) + 
  geom_bar(fill = "lightgreen", color="black") + 
  labs(title = "Gender Distribution", x = "Gender", y = "Frequency")

For this question, it was important to recode Gender, which is stored as 0/1, into readable labels. According to the dataset documentation, 1 is Male and 0 is Female (this agrees with the Sex column in the output above). Therefore, I changed the values in the Gender column to Male or Female. Next, I created a bar chart to show the number of females and males in this dataset.
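A base R alternative to case_when(), sketched here on the first six Gender values printed by head() above, is factor() with levels and labels: it recodes 0/1 and fixes the order of the bars in one call. Unlike the `TRUE ~ as.character(Gender)` branch, any code other than 0 or 1 becomes NA, which anyNA() would flag.

```r
# First six Gender values from the head() output above (0 = Female, 1 = Male)
gender_codes <- c(0, 0, 0, 0, 0, 1)

# One step: map the codes to labels and set the bar order at the same time
gender <- factor(gender_codes, levels = c(0, 1), labels = c("Female", "Male"))

print(table(gender))
anyNA(gender)  # TRUE would signal an unexpected code
```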

  b. NumEarlyClass
# Find min and max number of early classes for x-axis
min(SleepStudy$NumEarlyClass) 
[1] 0
max(SleepStudy$NumEarlyClass)
[1] 5
# Plot the distribution
ggplot(SleepStudy, aes(x = NumEarlyClass)) + 
  # geom_bar() counts each distinct value, so no binwidth is needed
  geom_bar(fill = "lightgreen", color = "black") + 
  scale_x_continuous(breaks = 0:5) +
  labs(title = "Number of Early Classes Frequency", x = "Number of Early Classes", y = "Frequency")

For this question, I first calculated the minimum and maximum number of early classes (0 and 5) to determine where the x-axis starts and ends, and used them to place a tick mark and label at every integer from 0 to 5. I then plotted the frequency of each number of early classes in the dataset. We chose a bar chart rather than a histogram because NumEarlyClass is discrete: it takes only the integer values 0 through 5.
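To check the bar heights numerically, the counts can be tabulated against a fixed set of levels so that a value that never occurs still gets a position with count 0. A minimal base R sketch using the first six NumEarlyClass values from the head() output above (the names `early_counts` and `breaks` are illustrative):

```r
# First six NumEarlyClass values from the head() output above
num_early <- c(0, 2, 0, 5, 0, 0)

# Fixing the levels to 0:5 keeps every possible value, including zero counts
early_counts <- table(factor(num_early, levels = 0:5))
print(early_counts)

# Tick marks derived from the data range instead of typed by hand
breaks <- seq(min(num_early), max(num_early), by = 1)
```

The `breaks` vector could then be passed to scale_x_continuous(breaks = breaks).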

  c. AlcoholUse
# Change the order of the x-axis
SleepStudy$AlcoholUse <- factor(SleepStudy$AlcoholUse,levels=c("Abstain", "Light", "Moderate", "Heavy"))

# Plot the graph 
ggplot(SleepStudy, aes(x = AlcoholUse)) + 
  geom_bar(fill="lightgreen", color = "black") + 
  labs(title = "Alcohol Use Frequency", x = "Alcohol Use", y = "Frequency")

For this question, I first used factor() to set the order of the AlcoholUse levels (Abstain, Light, Moderate, Heavy), so that the bars run from least to most alcohol use instead of in alphabetical order. After that, I used ggplot() to create a bar chart of the number of students at each level of alcohol use.
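The explicit levels matter because factor() otherwise sorts levels alphabetically. A small base R sketch with hypothetical values:

```r
# Hypothetical AlcoholUse values
use <- c("Light", "Moderate", "Abstain", "Heavy", "Light")

# Default factor(): levels sorted alphabetically (Abstain, Heavy, Light, Moderate)
print(levels(factor(use)))

# Explicit levels: ordered from least to most alcohol use
use_ordered <- factor(use, levels = c("Abstain", "Light", "Moderate", "Heavy"))
print(table(use_ordered))

# A misspelled or unexpected value silently becomes NA, so it is worth checking
typo <- factor(c("Light", "Heavey"), levels = levels(use_ordered))
print(anyNA(typo))
```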

  d. AverageSleep
# Calculate the min and max values for the x-axis
min(SleepStudy$AverageSleep) 
[1] 4.95
max(SleepStudy$AverageSleep)
[1] 10.62
# Plot the graph
ggplot(SleepStudy, aes(x = AverageSleep)) + 
  # boundary = 0 puts the half-hour bin edges on the tick marks (4.5, 5, 5.5, ...)
  geom_histogram(binwidth = 0.5, boundary = 0, fill = "lightgreen", color = "black") + 
  labs(title = "Average Sleep Duration Distribution", x = "Average Sleep (Hours)", y = "Frequency") +
  scale_x_continuous(breaks = seq(4.5, 11, by = 0.5))

For this question, I first calculated the minimum and maximum of average sleep (4.95 and 10.62 hours) to determine the range of the x-axis. I noticed that AverageSleep is continuous: its values are not whole numbers or even rounded to half hours. For that reason, we used a histogram rather than a bar chart, with x-axis breaks from 4.5 to 11 so that they cover the full range of the data. For the bin width and tick spacing we chose 0.5 hours (30 minutes), because that gives more detail than 1-hour bins without the noise of 0.25-hour (15-minute) bins. People also usually describe sleep in half hours (for example, "I slept six and a half hours").
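To see how half-hour bins cover the observed range, the binning can be reproduced with base R's hist(plot = FALSE). The eight values below are the minimum, the maximum, and the first six AverageSleep values printed above; the bin edges mirror the x-axis breaks chosen for the plot.

```r
# min, max, and the first six AverageSleep values from the output above
sleep <- c(4.95, 10.62, 7.18, 6.93, 5.02, 6.90, 6.35, 9.04)

# Half-hour bin edges that start below the minimum and end above the maximum
edges <- seq(4.5, 11, by = 0.5)

# hist() bins are right-closed, (a, b], like geom_histogram()'s default closed = "right"
h <- hist(sleep, breaks = edges, plot = FALSE)
print(data.frame(from = head(edges, -1), to = edges[-1], count = h$counts))
```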

3. Nutritional Facts for most common foods

# Packages needed for part 3 and part 4
library(ggplot2)
library(dplyr)
library(tidyverse)
library(readr)
# MASS is loaded last, so it masks dplyr::select(); call dplyr::select() explicitly if needed
library(MASS)

[12 points]

Data: nutrients.csv

The original source of the data is: https://en.wikipedia.org/wiki/Table_of_food_nutrients though there may be discrepancies between values on this page and the dataset.

Note: In the Measure column, “t” = teaspoon and “T” = tablespoon. In the food nutrient columns, the letter “t” indicates that only a trace amount is available (which you can assume is 0).
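Because read.csv() reads columns such as Grams (e.g. "1,419") and Calories (e.g. "8-44" and blank cells) as character, they need converting before calories per gram can be computed. The sketch below uses a hypothetical helper, `to_number()`, that is not part of the original solution; treating "t" as 0 follows the note above, while turning ranges like "8-44" and blank cells into NA is an assumption.

```r
# Hypothetical helper: convert a character nutrient column to numbers
to_number <- function(x) {
  x <- trimws(as.character(x))
  x[which(x == "t")] <- "0"            # trace amount -> 0, per the data note
  x <- gsub(",", "", x, fixed = TRUE)  # "1,419" -> "1419"
  suppressWarnings(as.numeric(x))      # "", "8-44" and other non-numbers -> NA (assumption)
}

calories <- to_number(c("660", "1,373", "t", "8-44", ""))
grams    <- to_number(c("976", "1,419", "14", "100", "100"))
calorie_density <- calories / grams  # calories per gram; NA where calories is unknown
print(calorie_density)
```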

  a. Create a new calorie_density column defined as calories per gram. Create a Cleveland dot plot for the 10% of foods with the highest calorie density. Note any data abnormalities in the plot.
nutrients <- read.csv("nutrients.csv")
head(nutrients)
                   Food Measure Grams Calories Protein Fat Sat.Fat Fiber Carbs
1            Cows' milk   1 qt.   976      660      32  40      36     0    48
2             Milk skim   1 qt.   984      360      36   t       t     0    52
3            Buttermilk   1 cup   246      127       9   5       4     0    13
4 Evaporated, undiluted   1 cup   252      345      16  20      18     0    24
5        Fortified milk  6 cups 1,419    1,373      89  42      23   1.4   119
6         Powdered milk   1 cup   103      515      27  28      24     0    39
        Category
1 Dairy products
2 Dairy products
3 Dairy products
4 Dairy products
5 Dairy products
6 Dairy products
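# Drop rows with missing Calories or Grams and assign the result so it is kept
# (blank cells read by read.csv() as "" are not NA, so drop_na() does not remove them)
nutrients <- nutrients %>% drop_na(Calories, Grams)
nutrients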
                                        Food       Measure Grams Calories
1                                 Cows' milk         1 qt.   976      660
2                                  Milk skim         1 qt.   984      360
3                                 Buttermilk         1 cup   246      127
4                      Evaporated, undiluted         1 cup   252      345
5                             Fortified milk        6 cups 1,419    1,373
6                              Powdered milk         1 cup   103      515
7                              skim, instant    1 1/3 cups    85      290
8                          skim, non-instant       2/3 cup    85      290
9                                Goats' milk         1 cup   244      165
10                       (1/2 cup ice cream)        2 cups   540      690
11                                     Cocoa         1 cup   252      235
12                                skim. milk         1 cup   250      128
13                              (cornstarch)         1 cup   248      275
14                                   Custard         1 cup   248      285
15                                 Ice cream         1 cup   188      300
16                                  Ice milk         1 cup   190      275
17                    Cream or half-and-half       1/2 cup   120      170
18                               or whipping       1/2 cup   119      430
19                                    Cheese         1 cup   225      240
20                                 uncreamed         1 cup   225      195
21                                   Cheddar    1-in. cube    17       70
22                       Cheddar, grated cup       1/2 cup    56      226
23                              Cream cheese         1 oz.    28      105
24                          Processed cheese         1 oz.    28      105
25                            Roquefort type         1 oz.    28      105
26                                     Swiss         1 oz.    28      105
27                                  Eggs raw             2   100      150
28                   Eggs Scrambled or fried             2   128      220
29                                     Yolks             2    34      120
30                                    Butter           1T.    14      100
31                                    Butter       1/2 cup   112      113
32                                    Butter       1/4 lb.   112      113
33                  Hydrogenated cooking fat       1/2 cup   100      665
34                                      Lard       1/2 cup   110      992
35                                 Margarine       1/2 cup   112      806
36                       Margarine, 2 pat or          1 T.    14      100
37                                Mayonnaise          1 T.    15      110
38                                  Corn oil          1 T.    14      125
39                                 Olive oil           1T.    14      125
40                        Safflower seed oil          1 T.    14      125
41                           French dressing          1 T.    15       60
42                     Thousand Island sauce          1 T.    15       75
43                                 Salt pork         2 oz.    60      470
44                                     Bacon      2 slices    16       95
45                                      Beef         3 oz.    85      245
46                                 Hamburger         3 oz.    85      245
47                               Ground lean         3 oz.    85      185
48                                Roast beef         3 oz.    85      390
49                                     Steak         3 oz.    85      330
50                     Steak, lean, as round         3 oz.    85      220
51                               Corned beef         3 oz.    85      185
52                   Corned beef hash canned         3 oz.    85      120
53                    Corned beef hash Dried         2 oz.    56      115
54                                   Pot-pie         1 pie   227      480
55                     Corned beef hash Stew         1 cup   235      185
56                                   chicken         3 oz.    85      185
57    Fried, breast or leg and thigh chicken         3 oz.    85      245
58                           Roasted chicken     3 1/2 oz.   100      290
59                     Chicken livers, fried        3 med.   100      140
60                            Duck, domestic     3 1/2 oz.   100      370
61                       Lamb, chop, broiled         4 oz.   115      480
62                               Leg roasted         3 oz.    86      314
63                         Shoulder, braised         3 oz.    85      285
64                       Pork, chop, 1 thick     3 1/2 oz.   100      260
65                           Ham pan-broiled         3 oz.    85      290
66                                  Ham, as          2 oz.    57      170
67                       Ham, canned, spiced         2 oz.    57      165
68                                Pork roast         3 oz.    85      310
69                              Pork sausage     3 1/2 oz.   100      475
70                                    Turkey     3 1/2 oz.   100      265
71                                      Veal         3 oz.    85      185
72                                     Roast         3 oz.    85      305
73                                     Clams         3 oz.    85       87
74                                       Cod     3 1/2 oz.   100      170
75                                 Crab meat         3 oz.    85       90
76                         Fish sticks fried             5   112      200
77                                  Flounder     3 1/2 oz.   100      200
78                                   Haddock         3 oz.    85      135
79                                   Halibut     3 1/2 oz.   100      182
80                                   Herring       1 small   100      211
81                                   Lobster         aver.   100       92
82                                  Mackerel         3 oz.    85      155
83                                   Oysters      6-8 med.   230      231
84                               Oyster stew         1 cup    85      125
85                                    Salmon         3 oz.    85      120
86                                  Sardines         3 oz.    85      180
87                                  Scallops     3 1/2 oz.   100      104
88                                      Shad         3 oz.    85      170
89                                    Shrimp         3 oz.    85      110
90                                 Swordfish       1 steak   100      180
91                                      Tuna         3 oz.    85      170
92                                 Artichoke       1 large   100     8-44
93                                 Asparagus      6 spears    96       18
94                                     Beans         1 cup   125       25
95                                      Lima         1 cup   160      140
96                         Lima, dry, cooked         1 cup   192      260
97                     Navy, baked with pork       3/4 cup   200      250
98                                Red kidney         1 cup   260      230
99                              Bean sprouts         1 cup    50       17
100                              Beet greens         1 cup   100       27
101                                Beetroots         1 cup   165        1
102                                 Broccoli         1 cup   150       45
103                         Brussels sprouts         1 cup   130       60
104                               Sauerkraut         1 cup   150       32
105                          Steamed cabbage         1 cup   170       40
106                                  Carrots         1 cup   150       45
107                              Raw, grated         1 cup   110       45
108                         Strips, from raw        1 mad.    50       20
109                              Cauliflower         1 cup   120       30
110                                   Celery         1 cup   100       20
111                                Stalk raw       1 large    40        5
112                            Chard steamed         1 cup   150       30
113                                 Collards         1 cup   150       51
114                                     Corn         1 ear   100       92
115                         cooked or canned         1 cup   200      170
116                                Cucumbers             8    50        6
117                         Dandelion greens         1 cup   180       80
118                                 Eggplant         1 cup   180       30
119                                   Endive         2 oz.    57       10
120                                     Kale         1 cup   110       45
121                                 Kohlrabi         1 cup   140       40
122                  Lambs quarters, steamed         1 cup   150       48
123                                  Lentils         1 cup   200      212
124                                  Lettuce      1/4 head   100       14
125                                  Iceberg      1/4 head   100       13
126                         Mushrooms canned             4   120       12
127                           Mustard greens             1   140       30
128                                     Okra    1 1/3 cups   100       32
129                                   Onions             1   210       80
130                               Raw, green       6 small    50       22
131                                  Parsley          2 T.    50        2
132                                 Parsnips         1 cup   155       95
133                                     Peas         1 cup   100       66
134                      Fresh, steamed peas         1 cup   100       70
135                              Frozen peas         1 cup   100         
136                        Split cooked peas        4 cups   100      115
137                              heated peas         1 cup   100       53
138                           Peppers canned         1 pod    38       10
139                Peppers Raw, green, sweet       1 large   100       25
140             Peppers with beef and crumbs        1 med.   150      255
141                          Potatoes, baked        1 med.   100      100
142                             French-fried     10 pieces    60      155
143     Potatoes Mashed with milk and butter         1 cup   200      230
144                      Potatoes, pan-tried       3/4 cup   100      268
145           Scalloped with cheese potatoes       3/4 cup   100      145
146          Steamed potatoes before peeling        1 med.   100       80
147                             Potato chips            10    20      110
148                                 Radishes       5 small    50       10
149                                Rutabagas        4 cups   100       32
150                                 Soybeans         1 cup   200      260
151                                  Spinach         1 cup   100       26
152                                   Squash         1 cup   210       35
153                           Winter, mashed         1 cup   200       95
154                           Sweet potatoes        1 med.   110      155
155                                  Candied        1 med.   175      235
156                                 Tomatoes         1 cup   240       50
157                          Raw, 2 by 2 1/2        1 med.   150       30
158                             Tomato juice         1 cup   240       50
159                            Tomato catsup          1 T.    17       15
160                            Turnip greens         1 cup   145       45
161                         Turnips, steamed         1 cup   155       40
162                    Watercress stems, raw         1 cup    50        9
163                       Apple juice canned         1 cup   250      125
164                            Apple vinegar       1/3 cup   100       14
165                              Apples, raw         1 med   130       70
166                         Stewed or canned         1 cup   240      100
167                                 Apricots         1 cup   250      220
168                          Dried, uncooked       1/2 cup    75      220
169                                    Fresh        3 med.   114       55
170                         Nectar, or juice         1 cup   250      140
171                                  Avocado     1/2 large   108      185
172                                   Banana        1 med.   150       85
173                             Blackberries         1 cup   144       85
174                              Blueberries         1 cup   250      245
175                               Cantaloupe      1/2 med.   380       40
176                                 Cherries         1 cup   257      100
177                               Fresh, raw         1 cup   114       65
178                Cranberry sauce sweetened         1 cup   277      530
179                                    Dates         1 cup   178      505
180                                     Figs             2    42      120
181                          Fresh, raw figs        3 med.   114       90
182                  figs Canned with syrup              3   115      130
183                   Fruit cocktail, canned         1 cup   256      195
184                      Grapefruit sections         1 cup   250      170
185           Grapefruit, fresh, 5" diameter           1/2   285       50
186                         Grapefruit juice         1 cup   250      100
187                                   Grapes         1 cup   153       70
188               European, as Muscat, Tokay         1 cup   160      100
189                              Grape juice         1 cup   250      160
190                              Lemon juice       1/2 cup   125       30
191               Lemonade concentratefrozen     6-oz. can   220      430
192               Limeade concentrate frozen     6-oz. can   218      405
193                             Olives large            10    65       72
194                               OlivesRipe            10    65      105
195                      Oranges 3" diameter        1 med.   180       60
196                             Orange juice      8 oz. or   250      112
197                                  Frozen      6-oz. can   210      330
198                                   Papaya      1/2 med.   200       75
199                                  Peaches         1 cup   257      200
200                               Fresh, raw        1 med.   114       35
201                                    Pears         1 cup   255      195
202                             Raw, 3 by 2V        1 med.   182      100
203                               Persimmons        1 med.   125       75
204                                Pineapple 1 large slice   122       95
205                        Pineapple Crushed         1 cup   260      205
206                               Raw, diced         1 cup   140       75
207                          Pineapple juice         1 cup   250      120
208                                    Plums         1 cup   256      185
209                         Raw, 2" diameter             1    60       30
210                                   Prunes         1 cup   270      300
211                              Prune juice         1 cup   240      170
212                                  Raisins       1/2 cup    88      230
213                              Raspberries       1/2 cup   100      100
214                                 Raw, red       3/4 cup   100       57
215                        Rhubarb sweetened         1 cup   270      385
216                             Strawberries         1 cup   227      242
217                                      Raw         1 cup   149       54
218                               Tangerines        I med.   114       40
219                               Watermelon       1 wedge   925      120
220                                 Biscuits             1    38      130
221                              Bran flakes         1 cup    25      117
222                     Bread, cracked wheat       1 slice    23       60
223                                      Rye       1 slice    23       55
224                     White, 20 slices, or    1-lb. loaf   454    1,225
225                              Whole-wheat    1-lb. loaf   454    1,100
226                              Whole-wheat       1 slice    23       55
227                   Corn bread ground meal     1 serving    50      100
228                               Cornflakes         1 cup    25      110
229                        Corn grits cooked         1 cup   242      120
230                                Corn meal         1 cup   118      360
231                                 Crackers        2 med.    14       55
232                       Soda, 2 1/2 square             2    11       45
233                                   Farina         1 cup   238      105
234                                    Flour         1 cup   110      460
235                      Wheat (all purpose)         1 cup   110      400
236                            Wheat (whole)         1 cup   120      390
237                                 Macaroni         1 cup   140      155
238                        Baked with cheese         1 cup   220      475
239                                  Muffins             1    48      135
240                                  Noodles         1 cup   160      200
241                                  Oatmeal         1 cup   236      150
242                        Pancakes 4" diam.             4   108      250
243                 Wheat, pancakes 4" diam.             4   108      250
244                          Pizza 14" diam.     1 section    75      180
245                           Popcorn salted        2 cups    28      152
246                              Puffed rice         1 cup    14       55
247                Puffed wheat presweetened         1 cup    28      105
248                                     Rice         1 cup   208      748
249                                Converted         1 cup   187      677
250                                    White         1 cup   191      692
251                              Rice flakes         1 cup    30      115
252                              Rice polish       1/2 cup    50      132
253                                    Rolls       1 large    50      411
254                         of refined flour             1    38      115
255                              whole-wheat             1    40      102
256                Spaghetti with meat sauce         1 cup   250      285
257                 with tomatoes and cheese         1 cup   250      210
258                             Spanish rice         1 cup   250      217
259                   Shredded wheat biscuit             1    28      100
260                                  Waffles             1    75      240
261                               Wheat germ         1 cup    68      245
262                Wheat-germ cereal toasted         1 cup    65      260
263              Wheat meal cereal unrefined       3/4 cup    30      103
264                            Wheat, cooked       3/4 cup   200      275
265                               Bean soups         1 cup   250      190
266                                Beef soup         1 cup   250      100
267                                 Bouillon         1 cup   240       24
268                             chicken soup         1 cup   250       75
269                             Clam chowder         1 cup   255       85
270                              Cream soups         1 cup   255      200
271                                   Noodle         1 cup   250      115
272                           Split-pea soup         1 cup   250      147
273                              Tomato soup         1 cup   245      175
274                                Vegetable         1 cup   250       80
275                              Apple betty     1 serving   100      150
276                            Bread pudding       3/4 cup   200      374
277                                    Cakes       1 slice    40      110
278                          Chocolate fudge       1 slice   120      420
279                                  Cupcake             1    50      160
280                               Fruit cake       1 slice    30      105
281                              Gingerbread       1 slice    55      180
282                     Plain, with no icing       1 slice    55      180
283                              Sponge cake       1 slice    40      115
284                                    Candy             5    25      104
285                         Chocolate creams             2    30      130
286                                    Fudge      2 pieces    90      370
287                             Hard candies         1 oz.    28       90
288                             Marshmallows             5    30       98
289                           Milk chocolate     2-oz. bar    56      290
290                          Chocolate syrup          2 T.    40       80
291                                Doughnuts             1    33      135
292                 Gelatin, made with water         1 cup   239      155
293                                    Honey          2 T.    42      120
294                                Ice cream        2 cups   300      250
295                                     Ices         1 cup   150      117
296                                preserves          1 T.    20       55
297                                  Jellies          1 T.    20       50
298                                 Molasses          1 T.    20       45
299                               Cane Syrup          1 T.    20       50
300                             9" diam. pie       1 slice   135      330
301                               Cherry Pie       1 slice   135      340
302                                  Custard       1 slice   130      265
303                           Lemon meringue       1 slice   120      300
304                                    Mince       1 slice   135      340
305                              Pumpkin Pie       1 slice   130      265
306                           Puddings Sugar         1 cup   200      770
307                        3 teaspoons sugar          1 T.    12       50
308           Brown, firm-packed, dark sugar         1 cup   220      815
309                                    Syrup          2 T.    40      100
310                       table blends sugar          2 T.    40      110
311                    Tapioca cream pudding         1 cup   250      335
312                                  Almonds       1/2 cup    70      425
313                       roasted and salted       1/2 cup    70      439
314                              Brazil nuts       1/2 cup    70      457
315                                  Cashews       1/2 cup    70      392
316                        coconut sweetened       1/2 cup    50      274
317                            Peanut butter       1/3 cup    50      300
318                   Peanut butter, natural       1/3 cup    50      284
319                                  Peanuts       1/3 cup    50      290
320                                   Pecans       1/2 cup    52      343
321                             Sesame seeds       1/2 cup    50      280
322                          Sunflower seeds       1/2 cup    50      280
323                                  Walnuts       1/2 cup    50      325
324                                     Beer        2 cups   480      228
325                                      Gin         1 oz.    28       70
326                                    Wines       1/2 cup   120      164
327                    Table (12.2% alcohol)       1/2 cup   120      100
328 Carbonated drinks Artificially sweetened        12 oz.   346        0
329                                Club soda        12 oz.   346        0
330                              Cola drinks        12 oz.   346      137
331                      Fruit-flavored soda        12 oz.   346      161
332                               Ginger ale        12 oz.   346      105
333                                Root beer        12 oz.   346      140
334                                   Coffee         1 cup   230        3
335                                      Tea         1 cup   230        4
    Protein Fat Sat.Fat Fiber Carbs                         Category
1        32  40      36     0    48                   Dairy products
2        36   t       t     0    52                   Dairy products
3         9   5       4     0    13                   Dairy products
4        16  20      18     0    24                   Dairy products
5        89  42      23   1.4   119                   Dairy products
6        27  28      24     0    39                   Dairy products
7        30   t       t     0    42                   Dairy products
8        30   t       t     1    42                   Dairy products
9         8  10       8     0    11                   Dairy products
10       24  24      22     0    70                   Dairy products
11        8  11      10     0    26                   Dairy products
12       18   4       3     1    13                   Dairy products
13        9  10       9     0    40                   Dairy products
14       13  14      11     0    28                   Dairy products
15        6  18      16     0    29                   Dairy products
16        9  10       9     0    32                   Dairy products
17        4  15      13     0     5                   Dairy products
18        2  44      27     1     3                   Dairy products
19       30  11      10     0     6                   Dairy products
20       38   t       t     0     6                   Dairy products
21        4   6       5     0     t                   Dairy products
22       14  19      17     0     1                   Dairy products
23        2  11      10     0     1                   Dairy products
24        7   9       8     0     t                   Dairy products
25        6   9       8     0     t                   Dairy products
26        7   8       7     0     t                   Dairy products
27       12  12      10     0     t                   Dairy products
28       13  16      14     0     1                   Dairy products
29        6  10       8     0     t          Fats, Oils, Shortenings
30        t  11      10     0     t          Fats, Oils, Shortenings
31      114 115     116   117   118          Fats, Oils, Shortenings
32      114 115     116   117   118          Fats, Oils, Shortenings
33        0 100      88     0     0          Fats, Oils, Shortenings
34        0 110      92     0     0          Fats, Oils, Shortenings
35        t  91      76     0     t          Fats, Oils, Shortenings
36        t  11       9     0     t          Fats, Oils, Shortenings
37        t  12       5     0     t          Fats, Oils, Shortenings
38        0  14       5     0     0          Fats, Oils, Shortenings
39        0  14       3     0     0          Fats, Oils, Shortenings
40        0  14       3     0     0          Fats, Oils, Shortenings
41        t   6       2     0     2          Fats, Oils, Shortenings
42        t   8       3     0     1          Fats, Oils, Shortenings
43        3  55             0     0                    Meat, Poultry
44        4   8       7     0     1                    Meat, Poultry
45       23  16      15     0     0                    Meat, Poultry
46       21  17      15     0     0                    Meat, Poultry
47       24  10       9     0     0                    Meat, Poultry
48       16  36      35     0     0                    Meat, Poultry
49       20  27      25     0     0                    Meat, Poultry
50       24  12      11     0     0                    Meat, Poultry
51       22  10       9     0     0                    Meat, Poultry
52       12   8       7     t     6                    Meat, Poultry
53       19   4       4     0     0                    Meat, Poultry
54       18  28      25     t    32                    Meat, Poultry
55       15  10       9     t    15                    Meat, Poultry
56       23   9       7     0     0                    Meat, Poultry
57       25  15      11     0     0                    Meat, Poultry
58       25  20      16     0     0                    Meat, Poultry
59       22  14      12     0  2.30                    Meat, Poultry
60       16  28       0     0     0                    Meat, Poultry
61       24  35      33     0     0                    Meat, Poultry
62       20  14      14     0     0                    Meat, Poultry
63       18  23      21     0     0                    Meat, Poultry
64       16  21      18     0     0                    Meat, Poultry
65       16  22      19     0     0                    Meat, Poultry
66       13  13      11     0     0                    Meat, Poultry
67        8  14      12     0     1                    Meat, Poultry
68       21  24      21     0     0                    Meat, Poultry
69       18  44      40     0     0                    Meat, Poultry
70       27  15       0     0     0                    Meat, Poultry
71       23   9       8     0     0                    Meat, Poultry
72       13  14      13     0     0                    Meat, Poultry
73       12   1       0     0     2                    Fish, Seafood
74       28   5       0     0     0                    Fish, Seafood
75       14   2       0     0     1                    Fish, Seafood
76       19  10       5     0     8                    Fish, Seafood
77       30   8       0     0     0                    Fish, Seafood
78       16   5       4     0     6                    Fish, Seafood
79       26   8       0     0     0                    Fish, Seafood
80       22  13       0     0     0                    Fish, Seafood
81       18   1       0     0     t                    Fish, Seafood
82       18   9       0     a     0                    Fish, Seafood
83      232 233     234   235   236                    Fish, Seafood
84       19   6       1     0     0                    Fish, Seafood
85       17   5       1     0     0                    Fish, Seafood
86       22   9       4     0     0                    Fish, Seafood
87       18   8       0     0    10                    Fish, Seafood
88       20  10       0     0     0                    Fish, Seafood
89       23   1       0     0     0                    Fish, Seafood
90       27   6       0     0     0                    Fish, Seafood
91       25   7       3     0     0                    Fish, Seafood
92        2   t       t     2    10                   Vegetables A-E
93        1   t       t   0.5     3                   Vegetables A-E
94        1   t       t   0.8     6                   Vegetables A-E
95        8   t       t   3.0    24                   Vegetables A-E
96       16   t       t     2    48                   Vegetables A-E
97       11   6       6     2    37                   Vegetables A-E
98       15   1       0   2.5    42                   Vegetables A-E
99        1   t       0   0.3     3                   Vegetables A-E
100       2   t       0   1.4     6                   Vegetables A-E
101      12   0             t  0.80                   Vegetables A-E
102       5   t       0   1.9     8                   Vegetables A-E
103       6   t       0   1.7    12                   Vegetables A-E
104       1   t       0   1.2     7                   Vegetables A-E
105       2   t       0   1.3     9                   Vegetables A-E
106       1   t       0   0.9    10                   Vegetables A-E
107       1   t       0   1.2    10                   Vegetables A-E
108       t   t       0   0.5     5                   Vegetables A-E
109       3   t       0     1     6                   Vegetables A-E
110       1   t       0     1     4                   Vegetables A-E
111       1   t       0   0.3     1                   Vegetables A-E
112       2   t       0   1.4     7                   Vegetables A-E
113       5   t       0     2     8                   Vegetables A-E
114       3   1       t   0.8    21                   Vegetables A-E
115       5   t       0   1.6    41                   Vegetables A-E
116       t   0       0   0.2     1                   Vegetables A-E
117       5   1       0   3.2    16                   Vegetables A-E
118       2   t       0   1.0     9                   Vegetables A-E
119       1   t       0   0.6     2                   Vegetables A-E
120       4   1       0   0.9     8                   Vegetables F-P
121       2   t       0   1.5     9                   Vegetables F-P
122       5   t       0   3.2     7                   Vegetables F-P
123      15   t       0   2.4    38                   Vegetables F-P
124       1   t       0   0.5     2                   Vegetables F-P
125       t   t       0   0.5     3                   Vegetables F-P
126       2   t       0     t     4                   Vegetables F-P
127       3   t       0   1.2     6                   Vegetables F-P
128       1   t       0     1     7                   Vegetables F-P
129       2   t       0   1.6    18                   Vegetables F-P
130       t   t       0     1     5                   Vegetables F-P
131       t   t       0     t     t                   Vegetables F-P
132       2   1       0     3    22                   Vegetables F-P
133       3   t       0   0.1    13                   Vegetables F-P
134       5   t       0   2.2    12                   Vegetables R-Z
135       5   t       0   1.8    12                   Vegetables R-Z
136       8   t       0   0.4    21                   Vegetables R-Z
137       3   t       0     1    10                   Vegetables R-Z
138       t   t       0     t     2                   Vegetables R-Z
139       1   t       0   1.4     6                   Vegetables R-Z
140      19   9       8     1    24                   Vegetables R-Z
141       2   t       0   0.5    22                   Vegetables R-Z
142      -1   7       3   0.4    20                   Vegetables R-Z
143       4  12      11   0.7    28                   Vegetables R-Z
144       4  14       6  0.40    33                   Vegetables R-Z
145       6   8       7  0.40    14                   Vegetables R-Z
146       2   t       0  0.40    19                   Vegetables R-Z
147       1   7       4     t    10                   Vegetables R-Z
148       t   0       0   0.3     2                   Vegetables R-Z
149       t   0       0   1.4     8                   Vegetables R-Z
150      22  11       0   3.2    20                   Vegetables R-Z
151       3   t       0     1     3                   Vegetables R-Z
152       1   t       0   0.6     8                   Vegetables R-Z
153       4   t       0   2.6    23                   Vegetables R-Z
154       2   1       0     1    36                   Vegetables R-Z
155       2   6       5   1.5    80                   Vegetables R-Z
156       2   t       0     1     9                   Vegetables R-Z
157       1   t       0   0.6     6                   Vegetables R-Z
158       2   t       0   0.6    10                   Vegetables R-Z
159       t   t       0     t     4                   Vegetables R-Z
160       4   1       0   1.8     8                   Vegetables R-Z
161       1   t       0   1.8     9                   Vegetables R-Z
162       1   t       0   0.3     1                       Fruits A-F
163       t   0       0     0    34                       Fruits A-F
164       t   0       0     0     3                       Fruits A-F
165       t   t       0     1    18                       Fruits A-F
166       t   t       0     2    26                       Fruits A-F
167       2   t       0     1    57                       Fruits A-F
168       4   t       0     1    50                       Fruits A-F
169       1   t       0  0.70    14                       Fruits A-F
170       1   t       0     2    36                       Fruits A-F
171       2  18      12  1.80     6                       Fruits A-F
172       1   t       0   0.9    23                       Fruits A-F
173       2   1       0  6.60    19                       Fruits A-F
174       1   t       0     2    65                       Fruits A-F
175       1   t       0  2.20     9                       Fruits A-F
176       2   1       0     2    26                       Fruits A-F
177       1   t       0   0.8    15                       Fruits A-F
178       t   t       0   1.2   142                       Fruits A-F
179       4   t       0   3.6   134                       Fruits A-F
180       2   t       0   1.9    30                       Fruits A-F
181       2   t       0     1    22                       Fruits A-F
182       1   t       0     1    32                       Fruits A-F
183       1   t       0   0.5    50                       Fruits A-F
184       1   t       0   0.5    44                       Fruits G-P
185       1   t       t     1    14                       Fruits G-P
186       1   t       0     1    24                       Fruits G-P
187       1   t       0   0.8    16                       Fruits G-P
188       1   t       0   0.7    26                       Fruits G-P
189       1   t       0     t    42                       Fruits G-P
190       t   t       0     t    10                       Fruits G-P
191       t   t       0     t   112                       Fruits G-P
192       t   t       0     t   108                       Fruits G-P
193       1  10       9   0.8     3                       Fruits G-P
194       1  13      12     1     1                       Fruits G-P
195       2   t       t     1    16                       Fruits G-P
196       2   t       0   0.2    25                       Fruits G-P
197       2   t       t   0.4    78                       Fruits G-P
198       1   t       0   1.8    18                       Fruits G-P
199       1   t       0     1    52                       Fruits G-P
200       1   t       0   0.6    10                       Fruits G-P
201       1   t       0     2    50                       Fruits G-P
202       1   1       0     2    25                       Fruits G-P
203       1   t       0     2    20                       Fruits G-P
204       t   t       0   0.4    26                       Fruits G-P
205       1   t       0   0.7    55                       Fruits G-P
206       1  t'       0   0.6    19                       Fruits G-P
207       1   t       0   0.2    32                       Fruits G-P
208       1   t       0   0.7    50                       Fruits G-P
209       t   t       0   0.2     7                       Fruits G-P
210       3   1       0   0.8    81                       Fruits G-P
211       1   t       0   0.7    45                       Fruits G-P
212       2   t       0   0.7    82                       Fruits R-Z
213       t   t       0     2    25                       Fruits R-Z
214       t   t       0     5    14                       Fruits R-Z
215       1   t       0   1.9    98                       Fruits R-Z
216       1   t       0   1.3    60                       Fruits R-Z
217       t   t       0   1.9    12                       Fruits R-Z
218       1   t       0     1    10                       Fruits R-Z
219       2   1       0   3.6    29                       Fruits R-Z
220       3   4       3     t    18 Breads, cereals, fastfood,grains
221       3   t       0  0.10    32 Breads, cereals, fastfood,grains
222       2   1       1  0.10    12 Breads, cereals, fastfood,grains
223       2   1       1  0.10    12 Breads, cereals, fastfood,grains
224      39  15      12  9.00   229 Breads, cereals, fastfood,grains
225      48  14      10 67.50   216 Breads, cereals, fastfood,grains
226       2   1       0  0.31    11 Breads, cereals, fastfood,grains
227       3   4       2  0.30    15 Breads, cereals, fastfood,grains
228       2   t       0   0.1    25 Breads, cereals, fastfood,grains
229       8   t       0   0.2    27 Breads, cereals, fastfood,grains
230       9   4       2   1.6    74 Breads, cereals, fastfood,grains
231       1   1       0     t    10 Breads, cereals, fastfood,grains
232       1   1       0     t     8 Breads, cereals, fastfood,grains
233       3   t       0     8    22 Breads, cereals, fastfood,grains
234      39  22       0   2.9    33 Breads, cereals, fastfood,grains
235      12   1       0   0.3    84 Breads, cereals, fastfood,grains
236      13   2       0   2.8    79 Breads, cereals, fastfood,grains
237       5   1       0   0.1    32 Breads, cereals, fastfood,grains
238      18  25      24     t    44 Breads, cereals, fastfood,grains
239       4   5       4     t    19 Breads, cereals, fastfood,grains
240       7   2       2   0.1    37 Breads, cereals, fastfood,grains
241       5   3       2   4.6    26 Breads, cereals, fastfood,grains
242       7   9       0   0.1    28 Breads, cereals, fastfood,grains
243       7   9       0   0.1    28 Breads, cereals, fastfood,grains
244       8   6       5     t    23 Breads, cereals, fastfood,grains
245       3   7       2   0.5    20 Breads, cereals, fastfood,grains
246       t   t       0     t    12 Breads, cereals, fastfood,grains
247       1   t       0   0.6    26 Breads, cereals, fastfood,grains
248      15   3       0   1.2   154 Breads, cereals, fastfood,grains
249      14   t       0   0.4   142 Breads, cereals, fastfood,grains
250      14   t       0   0.3   150 Breads, cereals, fastfood,grains
251       2   t       0   0.1    26 Breads, cereals, fastfood,grains
252       6   6       0   1.2    28 Breads, cereals, fastfood,grains
253       3  12      11   0.1    23 Breads, cereals, fastfood,grains
254       3   2       2     t    20 Breads, cereals, fastfood,grains
255       4   1       0   0.1    20 Breads, cereals, fastfood,grains
256      13  10       6  0.50    35 Breads, cereals, fastfood,grains
257       6   5       3  0.50    36 Breads, cereals, fastfood,grains
258       4   4       0  1.20    40 Breads, cereals, fastfood,grains
259       3   1       0  0.70    23 Breads, cereals, fastfood,grains
260       8   9       1  0.10    30 Breads, cereals, fastfood,grains
261      17   7       3  2.50    34 Breads, cereals, fastfood,grains
262      20   7       3  2.50    36 Breads, cereals, fastfood,grains
263       4   1       0  0.70    25 Breads, cereals, fastfood,grains
264      12   1       0  4.40    35 Breads, cereals, fastfood,grains
265       8   5       4  0.60    30                            Soups
266       6   4       4  0.50    11                            Soups
267       5   0       0     0     0                            Soups
268       4   2       2     0    10                            Soups
269       5   2       8  0.50    12                            Soups
270       7  12      11  1.20    18                            Soups
271       6   4       3  0.20    13                            Soups
272       8   3       3  0.50    25                            Soups
273       6   7       6  0.50    22                            Soups
274       4   2       2     0    14                            Soups
275       1   4       0   0.5    29                 Desserts, sweets
276      11  12      11  0.20    56                 Desserts, sweets
277       3   t       0     0    23                 Desserts, sweets
278       5  14      12   0.3    70                 Desserts, sweets
279       3   3       2     t    31                 Desserts, sweets
280       2   4       3   0.2    17                 Desserts, sweets
281       2   7       6     t    28                 Desserts, sweets
282       4   5       4     t    31                 Desserts, sweets
283       3   2       2     0    22                 Desserts, sweets
284       t   3       3     0    19                 Desserts, sweets
285       t   4       4     0    24                 Desserts, sweets
286       t  12      11   0.1    80                 Desserts, sweets
287       t   0       0     0    28                 Desserts, sweets
288       1   0       0     0    23                 Desserts, sweets
289       2   6       6   0.2    44                 Desserts, sweets
290       t   t       t     0    22                 Desserts, sweets
291       2   7       4     t    17                 Desserts, sweets
292       4   t       t     0    36                 Desserts, sweets
293       t   0       0     0    30                    Jams, Jellies
294       0   0      12    10     0                 Desserts, sweets
295       0   0       0     0    48                 Desserts, sweets
296       0   0       0     t    14                    Jams, Jellies
297       0   0       0     0    13                    Jams, Jellies
298       0   0       0     8    11                    Jams, Jellies
299       0   0       0     0    13                    Jams, Jellies
300       3  13      11   0.1    53                 Desserts, sweets
301       3  13      11   0.1    55                 Desserts, sweets
302       7  11      10     0    34                 Desserts, sweets
303       4  12      10   0.1    45                 Desserts, sweets
304       3   9       8  0.70    62                 Desserts, sweets
305       5  12      11     8    34                 Desserts, sweets
306       0   0       0     0   199                 Desserts, sweets
307       0   0       0     0    12                 Desserts, sweets
308       0   t       0     0   210                    Jams, Jellies
309       0   0       0     0    25                    Jams, Jellies
310       0   0       0     0    29                    Jams, Jellies
311      10  10       9     0    42                 Desserts, sweets
312      13  38      28   1.8    13                   Seeds and Nuts
313      13  40      31   1.8    13                   Seeds and Nuts
314      10  47      31     2     7                   Seeds and Nuts
315      12  32      28   0.9    20                   Seeds and Nuts
316       1  20      19     2    26                   Seeds and Nuts
317      12  25      17   0.9     9                   Seeds and Nuts
318      13  24      10   0.9     8                   Seeds and Nuts
319      13  25      16   1.2     9                   Seeds and Nuts
320       5  35      25   1.1     7                   Seeds and Nuts
321       9  24      13   3.1    10                   Seeds and Nuts
322      12  26       7   1.9    10                   Seeds and Nuts
323       7  32       7     1     8                   Seeds and Nuts
324       t   0       0     0     8        Drinks,Alcohol, Beverages
325       0   0       0     0     t        Drinks,Alcohol, Beverages
326       t   0       0     0     9        Drinks,Alcohol, Beverages
327       t   0       0     0     5        Drinks,Alcohol, Beverages
328       0   0       0     0     0        Drinks,Alcohol, Beverages
329       0   0       0     0     0        Drinks,Alcohol, Beverages
330       0   0       0     0    38        Drinks,Alcohol, Beverages
331       0   0       0     0    42        Drinks,Alcohol, Beverages
332       0   0       0     0    28        Drinks,Alcohol, Beverages
333       0   0       0     0    35        Drinks,Alcohol, Beverages
334       t   0       0     0     1        Drinks,Alcohol, Beverages
335       0   t       0     0     1        Drinks,Alcohol, Beverages
# Convert Calories and Grams to numeric (non-numeric entries such as "t" become NA) to compute calories per gram
nutrients$Calories <- as.numeric(nutrients$Calories)
nutrients$Grams <- as.numeric(nutrients$Grams)
nutrients$calorie_density <- nutrients$Calories / nutrients$Grams

# Sort by calorie_density in decreasing order with order() and keep the top 10% of rows
sorted_cal_density <- nutrients[order(-nutrients$calorie_density), ]
top_10_percent <- head(sorted_cal_density, floor(nrow(sorted_cal_density) * 0.1))

library(forcats)  # fct_reorder()
ggplot(top_10_percent, aes(x = calorie_density, y = fct_reorder(Food, calorie_density))) +
  geom_point(color = "cornflowerblue") +
  # theme_bw() is more readable than theme_linedraw()
  theme_bw() +
  labs(x = "Calorie density (calories per gram)", y = NULL,
       title = "10% of Foods with Highest Calorie Density")

One abnormality is that safflower seed oil, olive oil, and corn oil all have exactly the same calorie density. The same happens with thousand island sauce and powdered milk. There is also a gap in calorie density between about 7 and 9.
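Ties like these can be confirmed programmatically rather than by eye. Below is a minimal base-R sketch; the helper `find_tied_densities()` and the toy rows are hypothetical illustrations, not values from the nutrients data. It rounds densities before comparing them so that floating-point noise does not hide a tie.

```r
# Hypothetical helper: return every row whose (rounded) calorie_density
# is shared with at least one other row, sorted by density
find_tied_densities <- function(df, digits = 3) {
  key <- round(df$calorie_density, digits)
  tied_rows <- df[key %in% key[duplicated(key)], , drop = FALSE]
  tied_rows[order(tied_rows$calorie_density), , drop = FALSE]
}

# Toy rows for illustration only (made-up values, not from nutrients)
toy <- data.frame(
  Food = c("oil_a", "oil_b", "oil_c", "sauce_d", "milk_e", "bread_f"),
  Calories = c(125, 125, 125, 100, 100, 60),
  Grams = c(14, 14, 14, 20, 20, 25),
  stringsAsFactors = FALSE
)
toy$calorie_density <- toy$Calories / toy$Grams

tied <- find_tied_densities(toy)
print(tied[, c("Food", "calorie_density")])
```

With the real data, `find_tied_densities(top_10_percent)` should list the tied oils and any other exact ties among the top 10%.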

  b) Determine the food categories with the highest and lowest mean calorie density. Create a Cleveland dot plot of calorie density by food, showing only these two categories. (Facet by Category.) Note any data abnormalities in the plot.
# Mean calorie density per category, sorted with arrange()
library(tidyr)  # drop_na()
food_categories <- nutrients %>% 
  drop_na(c("Calories", "Grams")) %>%
  group_by(Category) %>%
  summarise(mean_calorie_density = mean(calorie_density)) %>%
  arrange(mean_calorie_density)

highest_mean_category <- food_categories$Category[food_categories$mean_calorie_density==max(food_categories$mean_calorie_density)]
lowest_mean_category <- food_categories$Category[food_categories$mean_calorie_density==min(food_categories$mean_calorie_density)]

two_food_categories <- nutrients %>%
  filter(Category %in% c(highest_mean_category, lowest_mean_category))

ggplot(two_food_categories, aes(x = calorie_density, y = reorder(Food, calorie_density))) +
  geom_point(color = "cornflowerblue") +
  facet_grid(Category~., scales = "free_y", space = "free_y") +
  labs(title = "Cleveland Dot Plot of Calorie Density by Food", y = NULL) +
  theme_bw() 

Abnormalities: Butter appears as two dots in the plot, and the lower one is clearly an outlier. The nutrients data set contains three Butter rows: two have a calorie density of about 1 (they overlap in the plot) and one has about 7.4. Apart from this outlier, there is a gap in calorie density between about 7 and 9.
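A small base-R sketch for spotting foods that appear in several rows with inconsistent calorie densities, like the Butter rows above. The helper `duplicate_food_spread()` is hypothetical, and the toy values only approximate the densities mentioned in the text (about 1 and about 7.4).

```r
# Hypothetical helper: for every food that appears in more than one row,
# report how many rows it has and the spread (max - min) of its calorie densities
duplicate_food_spread <- function(df) {
  df <- df[!is.na(df$calorie_density), , drop = FALSE]
  counts <- table(df$Food)
  repeated <- names(counts)[counts > 1]
  if (length(repeated) == 0) {
    return(data.frame(Food = character(0), n_rows = integer(0), density_range = numeric(0)))
  }
  sub <- df[df$Food %in% repeated, , drop = FALSE]
  spread <- tapply(sub$calorie_density, sub$Food, function(d) max(d) - min(d))
  data.frame(
    Food = names(spread),
    n_rows = as.vector(counts[names(spread)]),
    density_range = as.vector(spread),
    stringsAsFactors = FALSE
  )
}

# Toy rows for illustration only
toy <- data.frame(
  Food = c("Butter", "Butter", "Butter", "Bread"),
  calorie_density = c(1.01, 1.01, 7.4, 2.6),
  stringsAsFactors = FALSE
)
res <- duplicate_food_spread(toy)
print(res)
```

Running `duplicate_food_spread(nutrients)` should then surface Butter among the repeated foods with a large density range.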

  c) Use the same method as above to calculate protein density and carbohydrate density. Create a scatterplot of protein density vs. carbohydrate density faceted by category. To make the plot look better, combine different Fruit categories into one new category and do the same for Vegetables. What can you learn from the plot?
# Drop rows with missing Protein, Carbs or Calories, then use gsub() to replace "t" entries with "0"
nutrients_no_t <- nutrients %>% drop_na(c("Protein", "Carbs", "Calories"))
nutrients_no_t$Protein <- gsub("t", "0", nutrients_no_t$Protein)
nutrients_no_t$Carbs <- gsub("t", "0", nutrients_no_t$Carbs)

# Combine the different "Fruit" and "Vegetables" categories
nutrients_long <- nutrients_no_t %>%
  mutate(Category = ifelse(Category %in% c("Fruits A-F", "Fruits G-P", "Fruits R-Z"), "Fruit",
                    ifelse(Category %in% c("Vegetables A-E", "Vegetables F-P", "Vegetables R-Z"), "Vegetables",
                           Category)))

# Convert Calories, Protein, and Carbs to numeric to calculate densities
nutrients_long$Calories <- as.numeric(nutrients_long$Calories)
nutrients_long$Carbs <- as.numeric(nutrients_long$Carbs)
nutrients_long$Protein <- as.numeric(nutrients_long$Protein)
nutrients_long$protein_density <- nutrients_long$Protein / nutrients_long$Grams
nutrients_long$carbs_density <- nutrients_long$Carbs / nutrients_long$Grams

ggplot(nutrients_long, aes(x = protein_density, y = carbs_density)) +
  geom_point(color = "cornflowerblue") +
  facet_wrap(~Category) +
  labs(title = "Scatterplot of Protein Density vs. Carbohydrate Density", 
       x = "Protein Density", 
       y = "Carbohydrate Density") +
  theme_bw(base_size = 4)

We chose not to use scales = "free" in facet_wrap() so that all panels share the same axes, which makes the categories easier to compare. The plot shows that Jams, Jellies and Drinks, Alcohol, Beverages all have a protein density of 0, so their points form vertical lines at x = 0 in those two panels. There is no strong linear relationship in the other categories. For Fats, Oils, Shortenings and for Fish, Seafood, foods with low protein density tend to have low carbohydrate density and foods with high protein density tend to have high carbohydrate density, but there are large gaps in the data, so it is hard to conclude that these two categories have a linear relationship.
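To put a number on "no strong linear relationship", one option is a per-category Pearson correlation. This is a base-R sketch: `category_correlations()` is a hypothetical helper, and the toy data is constructed so the expected answers are known (r = 1, r = -1, and undefined for a constant column).

```r
# Hypothetical helper: Pearson correlation between columns x and y within each group.
# Returns NA when a group has fewer than 3 finite rows or a constant column
# (e.g. a category whose protein density is always 0).
category_correlations <- function(df, x, y, group) {
  pieces <- split(df, df[[group]])
  r <- vapply(pieces, function(d) {
    ok <- is.finite(d[[x]]) & is.finite(d[[y]])
    if (sum(ok) < 3 || sd(d[[x]][ok]) == 0 || sd(d[[y]][ok]) == 0) {
      return(NA_real_)
    }
    cor(d[[x]][ok], d[[y]][ok])
  }, numeric(1))
  data.frame(Category = names(r), r = unname(r), stringsAsFactors = FALSE)
}

# Toy data for illustration only
toy <- data.frame(
  Category = rep(c("A", "B", "C"), each = 4),
  protein_density = c(0.1, 0.2, 0.3, 0.4, 0.1, 0.2, 0.3, 0.4, 0, 0, 0, 0),
  carbs_density = c(0.2, 0.4, 0.6, 0.8, 0.8, 0.6, 0.4, 0.2, 0.1, 0.5, 0.3, 0.9),
  stringsAsFactors = FALSE
)
res <- category_correlations(toy, "protein_density", "carbs_density", "Category")
print(res)
```

Applied as `category_correlations(nutrients_long, "protein_density", "carbs_density", "Category")`, the zero-protein categories should come back as NA, matching the vertical lines in the plot.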

  d) Create an interactive scatterplot of protein calories vs. overall calories which shows the food name when you hover over a point. Explain how you calculated protein calories. If you find clear errors in the data based on this ratio, remove the problematic rows and redraw, clearly stating which rows were removed. What are some foods with very high ratios of protein calories to overall calories? With very low ratios?
library(plotly)
nutrients_protein_calories <- nutrients_no_t
# Protein provides about 4 calories per gram (derivation below), so protein_calories = 4 * Protein
nutrients_protein_calories <- nutrients_protein_calories %>%
  mutate(protein_calories = as.numeric(Protein) * 4)

g <- ggplot(nutrients_protein_calories, aes(Calories, protein_calories, label = Food)) +
  geom_point(color = "cornflowerblue") +
  labs(title = "Protein Calories vs. Overall Calories (with problematic rows)")
ggplotly(g)
# Clear errors: protein calories cannot exceed overall calories or be negative, so remove those rows and redraw
nutrients_protein_calories_rm <- nutrients_protein_calories %>%
  filter(protein_calories <= Calories)

# Remove negative protein_calories
nutrients_protein_calories_rm <- nutrients_protein_calories_rm %>%
  filter(protein_calories >= 0)

g2 <- ggplot(nutrients_protein_calories_rm, aes(Calories, protein_calories, label = Food)) +
  geom_point(color = "cornflowerblue") +
  labs(title = "Protein Calories vs. Overall Calories (problematic rows removed)")
ggplotly(g2)
# Rows that are removed
rm_rows <- nutrients_protein_calories %>%
  filter(protein_calories > Calories | protein_calories < 0)
print(rm_rows)
          Food   Measure Grams Calories Protein Fat Sat.Fat Fiber Carbs
1       Butter   1/2 cup   112      113     114 115     116   117   118
2       Butter   1/4 lb.   112      113     114 115     116   117   118
3      Oysters  6-8 med.   230      231     232 233     234   235   236
4    Beetroots     1 cup   165        1      12   0             t  0.80
5 French-fried 10 pieces    60      155      -1   7       3   0.4    20
                 Category calorie_density protein_calories
1 Fats, Oils, Shortenings     1.008928571              456
2 Fats, Oils, Shortenings     1.008928571              456
3           Fish, Seafood     1.004347826              928
4          Vegetables A-E     0.006060606               48
5          Vegetables R-Z     2.583333333               -4
# Rows 31, 32, 83, 101 and 142 of the original nutrients dataset were removed.

How we calculated protein calories: In rows 324-335, 306-307, 295-299 and 186-192 of nutrients_no_t, every nutrient except Carbs is close to 0, so almost all of their calories come from carbohydrates. Dividing their total calories by their total grams of carbohydrate gives 3527/825 ≈ 4 calories per gram of carbohydrate. We then used rows 2, 7 and 20, where every nutrient except Carbs and Protein is close to 0. Writing x for calories per gram of protein and y for calories per gram of carbohydrate, these rows give 36x + 52y = 360, 30x + 42y + m = 290 and 38x + 6y + n = 225, where m and n are the contributions of the remaining nutrients. Substituting y ≈ 4 gives x ≈ 4, so we computed protein_calories = 4 × Protein. This can be verified by plugging x = 4 into other rows.
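The arithmetic behind this estimate can be written out as follows. It only checks the numbers quoted in the paragraph, assuming, as the text does, that y ≈ 4 and that m is small; the last line shows how much the remaining nutrients contribute in row 20 once x = y = 4.

```latex
\begin{aligned}
y &\approx \frac{3527}{825} \approx 4.28 \approx 4 \\
36x + 52 \cdot 4 = 360 &\;\Rightarrow\; x = \frac{360 - 208}{36} = \frac{152}{36} \approx 4.22 \\
30x + 42 \cdot 4 + m = 290 &\;\Rightarrow\; x = \frac{122 - m}{30} \approx 4.07 \quad (m \approx 0) \\
38 \cdot 4 + 6 \cdot 4 + n = 225 &\;\Rightarrow\; n = 225 - 152 - 24 = 49
\end{aligned}
```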

Foods such as Stalk raw, Bouillon, Shrimp and Lobster have very high ratios of protein calories to overall calories, while Salt, pork, Frozen and Dates have very low ratios.
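One way to list the extremes instead of reading them from the hover labels is to rank foods by the share of calories that come from protein. A base-R sketch: `protein_ratio_extremes()` is a hypothetical helper that uses the 4 calories per gram of protein derived above, and the toy rows are made up.

```r
# Hypothetical helper: share of calories from protein (4 calories per gram of protein),
# returning the k highest and k lowest foods
protein_ratio_extremes <- function(df, k = 5) {
  df <- df[is.finite(df$Calories) & df$Calories > 0 & is.finite(df$Protein), , drop = FALSE]
  df$protein_ratio <- 4 * df$Protein / df$Calories
  ranked <- df[order(-df$protein_ratio), c("Food", "protein_ratio")]
  list(highest = head(ranked, k), lowest = tail(ranked, k))
}

# Toy rows for illustration only
toy <- data.frame(
  Food = c("shrimp_like", "bread_like", "dates_like", "bouillon_like"),
  Calories = c(100, 80, 50, 9),
  Protein = c(20, 3, 0, 2),
  stringsAsFactors = FALSE
)
ext <- protein_ratio_extremes(toy, k = 1)
print(ext)
```

Because Protein is still stored as text after the gsub() step, the real call would convert it first, e.g. `protein_ratio_extremes(transform(nutrients_protein_calories_rm, Protein = as.numeric(Protein)))`.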

  e) The National Academy of Medicine in the United States recommends that protein should account for 10%-35% of one’s daily caloric intake. (See: https://www.hsph.harvard.edu/nutritionsource/what-should-you-eat/protein/) Suppose you wish to follow this recommendation and to make things simple you want to only eat foods with this percentage of protein. To assist you with this eating plan, create a static version of your graph from part d) in which food points are colored by their protein content as follows:

% calories from protein   label
< 10%                     too little protein
10% - 35%                 recommended protein
> 35%                     too much protein

Add lines to serve as boundaries between different colored points. Label some of the points with the food name using ggrepel::geom_text_repel() (It will drop many of the point labels automatically due to overlaps.)

(To be clear, this isn’t a sensible diet! One should consider the percentage of protein overall not on a food by food basis.)

# Calculate the percentage of protein calories
nutrients_percent <- nutrients_protein_calories_rm %>%
  mutate(percentage_protein_calories = (protein_calories / Calories) * 100)

# Use case_when() from dplyr to create a new variable that relies on a complex combination of existing variables
nutrients_labels <- nutrients_percent %>%
  mutate(protein_percent_labels = case_when(
    percentage_protein_calories < 10 ~ "too little protein",
    percentage_protein_calories >= 10 & percentage_protein_calories <= 35 ~ "recommended protein",
    percentage_protein_calories > 35 ~ "too much protein"
  ))
#install.packages('ggrepel')
library(ggrepel)

g3 <- ggplot(nutrients_labels, aes(x = Calories, y = protein_calories, color = protein_percent_labels, label = Food)) +
  geom_point() +
  labs(x = "Calories", y = "Protein calories", color = "Protein content") +
  geom_text_repel(segment.size = 0.1) +
  # Boundaries between groups: protein_calories = 0.10 * Calories and protein_calories = 0.35 * Calories
  geom_abline(intercept = 0, slope = 0.10, col = "black") +
  geom_abline(intercept = 0, slope = 0.35, col = "black") +
  # Set the semi-transparent color of each label manually
  scale_color_manual(values = c("too little protein" = "#00000030", "recommended protein" = "#ff000050", "too much protein" = "#0180FF50")) +
  theme_bw()

g3
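The boundary lines follow directly from the definition of the percentage. Writing P for protein_calories and C for Calories (our shorthand, assuming C > 0), the 10% and 35% cut-offs are straight lines through the origin:

```latex
\begin{aligned}
100 \cdot \frac{P}{C} = 10 &\iff P = 0.10\,C \\
100 \cdot \frac{P}{C} = 35 &\iff P = 0.35\,C
\end{aligned}
```

In ggplot2 terms these are `geom_abline(intercept = 0, slope = 0.10)` and `geom_abline(intercept = 0, slope = 0.35)`: points above the steeper line have too much protein, and points below the shallower line have too little.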

4. Babies

[4 points]

Data: babies in the openintro package

library(openintro)
head(babies)
# A tibble: 6 × 8
   case   bwt gestation parity   age height weight smoke
  <int> <int>     <int>  <int> <int>  <int>  <int> <int>
1     1   120       284      0    27     62    100     0
2     2   113       282      0    33     64    135     0
3     3   128       279      0    28     64    115     1
4     4   123        NA      0    36     69    190     0
5     5   108       282      0    23     67    125     1
6     6   136       286      0    25     62     93     0

For all, adjust parameters to the levels that provide the best views of the data and describe what you see. (It is not necessary to repeat information – for parts b) - d) describe anything new that wasn’t previously visible.)

Draw four plots of bwt vs. gestation with the following variations:

  a) Scatterplot – adjust point size and alpha.
ggplot(babies, aes(y = bwt, x = gestation)) +
  geom_point(size = 1.5, alpha = 0.3) +
  labs(title = "Scatterplot",
       y = "bwt (in ounces)", 
       x = "gestation (in days)") 

There appears to be a positive correlation between gestation and bwt. Most points form one dense cluster, and there are some outliers, such as the point with a gestation of around 150 days. We put bwt on the y-axis because it is the variable that depends on gestation. Overall, the points follow an upward trend.
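The "positive correlation" can be quantified directly. babies has missing gestation values (case 4 above), so a correlation must skip incomplete rows. A base-R sketch; `complete_case_cor()` is a hypothetical helper and the toy vectors are not the babies data.

```r
# Hypothetical helper: Pearson correlation using only rows where both values
# are present, plus the number of rows actually used
complete_case_cor <- function(x, y) {
  ok <- complete.cases(x, y)
  list(r = cor(x[ok], y[ok]), n = sum(ok))
}

# Toy vectors for illustration only
toy_gestation <- c(270, 280, 290, NA)
toy_bwt <- c(110, 120, 130, 125)
res <- complete_case_cor(toy_gestation, toy_bwt)
print(res)
```

On the real data the call would be `complete_case_cor(babies$gestation, babies$bwt)`.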

  b) Scatterplot with density contour lines
ggplot(babies, aes(y = bwt, x = gestation)) +
  geom_point(size = 1.5, alpha = 0.3) +
  geom_density2d() +
  labs(title = "Scatterplot with Density Contour Lines",
       y = "bwt (in ounces)", 
       x = "gestation (in days)") 

The density contour lines clearly show the density within the clustered region, suggesting a relationship between the two variables. Most (gestation, bwt) pairs fall between about 250 and 300 days of gestation and between 80 and 160 ounces of birth weight, and the highest concentration is around 275 days and 122 ounces.
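To locate the densest point numerically instead of reading it off the contours, one option is a 2D kernel density estimate. To our understanding, ggplot2's geom_density2d() is built on `MASS::kde2d()`, but its default bandwidth and grid may differ from this sketch, so treat the result as an approximation. `kde_peak()` is a hypothetical helper, and the simulated points only stand in for the babies data.

```r
# MASS is called via MASS:: rather than library(MASS), because attaching MASS
# masks dplyr::select()
kde_peak <- function(x, y, n = 50) {
  ok <- complete.cases(x, y)
  est <- MASS::kde2d(x[ok], y[ok], n = n)
  # est$z is an n x n matrix: rows follow est$x, columns follow est$y
  idx <- arrayInd(which.max(est$z), dim(est$z))
  c(x = est$x[idx[1, 1]], y = est$y[idx[1, 2]])
}

# Simulated points centred at (275, 122); not the babies data
set.seed(1)
sim_gestation <- rnorm(1000, mean = 275, sd = 12)
sim_bwt <- rnorm(1000, mean = 122, sd = 16)
peak <- kde_peak(sim_gestation, sim_bwt)
print(peak)
```

With the real data, `kde_peak(babies$gestation, babies$bwt)` should land near the region described above.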

  c) Hexagonal heatmap of bin counts
ggplot(babies, aes(y = bwt, x = gestation)) +
  geom_hex(binwidth = c(5, 5)) +
  scale_fill_viridis_c() +
  labs(title = "Hexagonal Heatmap of Bin Counts",
       y = "bwt (in ounces)", 
       x = "gestation (in days)") 

The hexagonal heatmap groups points into hexagonal bins and colors each bin by its count, so the number of points in each region is easy to read. The most concentrated bin contains over 30 points, around 275 days of gestation and 122 ounces of birth weight. Counts decrease from over 30 near this center to fewer than 10 farther out, and the heatmap makes the outliers explicit.

  d) Square heatmap of bin counts
ggplot(babies, aes(y = bwt, x = gestation)) +
  geom_bin2d(binwidth = c(5, 5)) +
  scale_fill_viridis_c() + 
  labs(title = "Square Heatmap of Bin Counts",
       y = "bwt (in ounces)", 
       x = "gestation (in days)") 

There is not much difference between the square and hexagonal heatmaps. In the square heatmap the mid-density bins seem to group together, but this grouping is more pronounced in the hexagonal heatmap.
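To see what each square tile counts, the binning can be reproduced by hand: every point is assigned to a 5 × 5 square and the points per square are tallied. This base-R sketch uses floor-based bin edges, which is an assumption; geom_bin2d() may anchor its bins differently, so its counts can differ slightly. `busiest_bin()` and the toy points are hypothetical.

```r
# Hypothetical helper: assign each point to a width x width square bin
# (floor-based edges) and return the most populated bin
busiest_bin <- function(x, y, width = 5) {
  ok <- complete.cases(x, y)
  bx <- floor(x[ok] / width) * width  # left edge of each point's x bin
  by <- floor(y[ok] / width) * width  # bottom edge of each point's y bin
  counts <- as.data.frame(table(bx = bx, by = by), stringsAsFactors = FALSE)
  top <- counts[which.max(counts$Freq), ]
  data.frame(x_from = as.numeric(top$bx), y_from = as.numeric(top$by), count = top$Freq)
}

# Toy points for illustration only
toy_x <- c(271, 273, 274, 276, 290)
toy_y <- c(121, 122, 124, 123, 100)
res <- busiest_bin(toy_x, toy_y)
print(res)
```

On the real data, `busiest_bin(babies$gestation, babies$bwt)` gives the square counterpart of the brightest tile.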